an error function, such as

$$\varepsilon = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2 \qquad (3.33)$$
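As a minimal sketch of Eq. (3.33), the function below computes the mean squared error of a set of predictions $\hat{y}_n$ against targets $y_n$ (the function name and NumPy usage are illustrative, not from the source):

```python
import numpy as np

def error(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Error function of Eq. (3.33): the mean of squared residuals over N samples."""
    return float(np.mean((y_hat - y) ** 2))

# e.g. error(np.array([1.1, 0.9]), np.array([1.0, 1.0])) -> 0.01
```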

Denoting all model parameters using a vector notation $\mathbf{w}$ and a data vector $\mathbf{x}$, the estimation of $\mathbf{w}$ is an optimisation process

$$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \left\{ \frac{1}{N}\sum_{n=1}^{N}\left(f(\mathbf{x}_n, \mathbf{w}) - y_n\right)^2 \right\} \qquad (3.34)$$
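A sketch of the objective in Eq. (3.34) for a generic nonlinear model $f$ is shown below; the single-tanh-unit model, the data, and the use of scipy.optimize are illustrative assumptions, not the book's method:

```python
import numpy as np
from scipy.optimize import minimize

def objective(w: np.ndarray, f, X: np.ndarray, y: np.ndarray) -> float:
    """The quantity minimised in Eq. (3.34): mean squared error of f over the data."""
    predictions = np.array([f(x_n, w) for x_n in X])
    return float(np.mean((predictions - y) ** 2))

# Hypothetical nonlinear model: a single tanh unit, f(x, w) = tanh(w . x).
f = lambda x, w: np.tanh(w @ x)
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.5, -0.2, 0.3])
w_hat = minimize(objective, x0=np.zeros(2), args=(f, X, y)).x
```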

Because this $f(\mathbf{x}, \mathbf{w})$ is a nonlinear function, there will not be an analytical format of the estimated model parameters such as the one used in linear models. Therefore, an algorithm called the backward propagation algorithm has been developed to estimate parameters for an MLP model [Rumelhart, et al., 1986]. The backward propagation algorithm is a variant of Newton's method [Fletcher, 1987]. It is based on the derivative of an error function, which is defined in the equation shown below, where $0 < \eta < 1$ is called the learning rate,

$$\Delta\mathbf{w} = -\eta\,\frac{\partial\varepsilon}{\partial\mathbf{w}} \qquad (3.35)$$

Use of the above equation means that the change (update) of every model parameter is negatively proportional to the derivative of the error function, $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta\,\frac{\partial\varepsilon}{\partial\mathbf{w}}$.
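The iteration can be sketched as follows. Backward propagation obtains $\partial\varepsilon/\partial\mathbf{w}$ analytically via the chain rule through the network layers; to keep the sketch self-contained, a finite-difference gradient stands in for that computation, and all names are illustrative:

```python
import numpy as np

def numerical_gradient(eps_fn, w: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Finite-difference estimate of d(eps)/dw; backward propagation
    computes the same gradient analytically via the chain rule."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += h
        w_minus[i] -= h
        grad[i] = (eps_fn(w_plus) - eps_fn(w_minus)) / (2.0 * h)
    return grad

def train(eps_fn, w0: np.ndarray, eta: float = 0.1, steps: int = 100) -> np.ndarray:
    """Repeated application of Eq. (3.35): w_{t+1} = w_t - eta * d(eps)/dw."""
    w = w0.copy()
    for _ in range(steps):
        w = w - eta * numerical_gradient(eps_fn, w)
    return w
```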

A large derivative points to a steep or sharp segment of the error function curve. Therefore, a cautious update of the model parameters is required. This cautious move can avoid missing the optimal point (jumping over the optimal point) of the error function curve. A small derivative corresponds to a flatter segment of the error function curve. Therefore, a greedy or greater update of the model parameters can be made. However, this simple update rule may cause oscillation, i.e. the over-update may jump over the saddle point of the error function curve forward and backward.
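This trade-off can be seen on a hypothetical one-parameter error curve $\varepsilon(w) = w^2$, whose derivative is $2w$: a cautious learning rate walks steadily towards the optimum, while a rate close to 1 makes the parameter jump over the optimum forward and backward (a minimal sketch, not tied to any particular model):

```python
def updates(eta: float, w: float = 1.0, steps: int = 6) -> list:
    """Apply Eq. (3.35) to eps(w) = w**2, where d(eps)/dw = 2*w."""
    path = [w]
    for _ in range(steps):
        w = w - eta * 2.0 * w
        path.append(round(w, 4))
    return path

print(updates(eta=0.1))  # smooth decay towards the optimum at w = 0
print(updates(eta=0.9))  # sign flips each step: oscillation around the optimum
```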